Abstract

We present a novel particle filtering algorithm for tracking a
moving sound source using a microphone array. If there are $N$
microphones in the array, we track all $\binom{N}{2}$ delays with
a single particle filter over time. Since tracking in high
dimensions is known to be rife with difficulties, we integrate
into our particle filter a model of the low dimensional manifold
on which these delays lie. Our manifold model builds on work on
modeling low dimensional manifolds via random projection trees
[5]. In addition, we introduce a new weighting scheme for our
particle filtering algorithm based on recent advances in online
learning. We show that our novel TDOA tracking algorithm, which
integrates a manifold model, can greatly outperform standard
particle filters on this audio tracking task.



1 Introduction 

There is an increasing interest in locating audio sources 
with a microphone array as a means to direct the 
pointing of a camera. Camera pointing applications 
include video conferencing, surveillance, game playing 
and interactive displays. In addition, speech enhancement
with microphone arrays relies critically on knowing
the correct source location.

One popular method for locating an audio source is 
based on measuring the delays observed between spa- 
tially separated pairs of microphones known as the 
time delay of arrival (TDOA). To locate a source, a
two-stage process can be employed: first, TDOAs for all pairs
of microphones are estimated, and then a source location is
derived from this delay information. If microphone positions
are given, the second step amounts to approximately solving a
set of non-linear physical equations such as in [6]. However,
localizing an audio source accurately in a large room requires
that the microphones be far apart from each other. As a result,
it becomes difficult to estimate their positions accurately
within a coordinate system. If the positions are not known, then
a regressor can be learned that maps TDOAs to camera pointing
directives as in [7, 4].

In this work we focus on accurately estimating and 
tracking TDOAs for a microphone array in a large 
room. There is an extensive literature on using particle
filters for tracking audio sources when the microphone
positions are known [13, 11]. Since positional information is
known, the state space for the particles is typically only the
two or three spatial dimensions of the sound source location.
When the microphone positions are not known and we attempt to
track in the native TDOA space, we fall victim to the slew of
problems that come with tracking in high dimensions. With $N$
microphones in the array, each pair has a TDOA that needs to be
tracked, so the state space has dimension $D = \binom{N}{2}$.
$D$ can be quite large for a microphone array in a large room.

To alleviate the problem of high dimensionality we pro- 
pose an addition to the particle filter that includes a 
restriction on the state space of particles to that of a 
low dimensional manifold. Underlying the D dimen- 
sions of a TDOA measurement are only three degrees 
of spatial freedom for the sound source to move in. 
Each 3-d spatial location creates a unique TDOA vec- 
tor which varies smoothly with smooth variations in 
the spatial location. We model this low dimensional 
manifold using a tree-based spatial partitioning data 
structure combined with principal components analy- 
sis. Our tree structure is based on work on random 
projection trees, which have been shown to adapt to 
low dimensional intrinsic structure when the data itself 
lies in a high dimensional space [5]. 

We also investigate in this work a new particle filter
based on work from the online learning literature. In
particular, we focus on combining expert advice via the normal
hedge algorithm [2]. For particle filters, each expert is itself
a particle that predicts a state at each time step. The normal
hedge particle filter gives both a new particle weighting scheme
and a natural resampling scheme for particles, based on the fact
that the algorithm explicitly gives zero weight to poorly
performing particles. Using normal hedge in the particle
filtering framework was initially explored in [3]. This is the
first time this algorithm has been applied to the TDOA tracking
problem and, to the best of our knowledge, to any practical
problem to date.

The rest of the paper is organized as follows. Section 2
briefly discusses how we estimate TDOAs for a given
pair of microphones via the phase transform. Section 3
discusses random projection trees and how they adapt
to low dimensional intrinsic structure. Section 4 discusses
our particle filter implementation that includes
the model of the manifold. Finally, in Section 5 we discuss
some experiments on tracking TDOA vectors with
real-world data collected from an interactive display.

2 Time Delay of Arrival 

One very popular method for estimating a TDOA
given frames of audio from a pair of microphones
is to use a generalized correlation technique such as
the phase transform, otherwise known as PHAT [13].
PHAT is a normalized cross correlation technique that
removes the magnitude information from the audio signals,
placing the emphasis on aligning the phase components.
Define $R_p(\tau)$ as the PHAT correlation for microphone
pair $p$ at time delay $\tau$; then the TDOA is often
estimated by

$$\Delta_p = \arg\max_{\tau} R_p(\tau) \qquad (1)$$

However, in a reverberant environment there are often
spurious peaks in $R_p$ from either line noise or
multipath reflections. In these cases the true TDOA
may not correspond to the largest peak in the PHAT
correlation. By using particle filters we are able to
leverage secondary peak information when formulating a
likelihood function that incorporates the entirety of the
observation $R_p$. This gives the particle filtering method
a robustness over traditional approaches that depend
on the accuracy of Equation (1) over all pairs $p$.
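
As an illustration of Equation (1), the following is a minimal Python sketch of a PHAT correlation and its arg-max TDOA estimate. It uses a naive discrete Fourier transform for clarity rather than speed. The function names (`dft`, `phat_correlation`, `tdoa_estimate`), the `eps` guard and the circular-lag convention are assumptions made for this example and are not taken from the paper.

```python
import cmath
import random


def dft(x, sign):
    """Naive O(n^2) DFT. sign=-1 is the forward transform,
    sign=+1 the unnormalized inverse transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def phat_correlation(x, y, eps=1e-12):
    """Circular PHAT correlation of two equal-length audio frames."""
    n = len(x)
    X, Y = dft(x, -1), dft(y, -1)
    # Cross-power spectrum, then discard magnitudes and keep only phase.
    cross = [a * b.conjugate() for a, b in zip(X, Y)]
    whitened = [c / max(abs(c), eps) for c in cross]
    return [v.real / n for v in dft(whitened, +1)]


def tdoa_estimate(x, y):
    """Arg-max of the PHAT correlation, reported as a signed lag in samples
    (positive when x is a delayed copy of y)."""
    r = phat_correlation(x, y)
    n = len(r)
    best = max(range(n), key=lambda lag: r[lag])
    return best if best <= n // 2 else best - n


# Toy example: `delayed` is `ref` circularly delayed by 3 samples.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(32)]
delayed = [ref[(t - 3) % 32] for t in range(32)]
estimated_delay = tdoa_estimate(delayed, ref)
```

In a reverberant room the largest peak found this way may be spurious, which is why the trackers discussed later use the whole correlation rather than only its arg-max.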

3 Modeling the Manifold 

A TDOA vector has only three underlying spatial degrees
of freedom. If the microphone positions were known, then
the physics equation for the TDOA between microphone
pair $p = (i, j)$ is

$$\Delta_p = \frac{\lVert m_i - s \rVert_2 - \lVert m_j - s \rVert_2}{c} \qquad (2)$$

where $m_i$ is the position of microphone $i$, $s$ is the
source location and $c$ is the speed of sound in air. In
this work we assume no such knowledge of $m_i$, but
nevertheless the same physical principles apply. As
$s$ varies smoothly, so does $\Delta_p$. So even though the
vector containing the TDOAs for all microphone pairs
has $D$ components, the real underlying dimensionality
is only three. We call this lower dimensional smooth
structure the TDOA manifold.

Figure 1: A toy dataset whose distribution is shown
in pink. A PD-tree of height one is built with top
principal direction shown in each leaf node.
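
To make the physical delay model above concrete, here is a small Python sketch that computes the full TDOA vector for a known geometry. The microphone coordinates, the source position and the value used for the speed of sound are hypothetical and chosen only for illustration. The paper itself assumes the microphone positions are unknown.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # metres per second; an assumed room-temperature value


def tdoa_vector(mics, source, c=SPEED_OF_SOUND):
    """One delay per unordered microphone pair (i, j) with i < j:
    (|m_i - s| - |m_j - s|) / c."""
    return [
        (math.dist(m_i, source) - math.dist(m_j, source)) / c
        for m_i, m_j in itertools.combinations(mics, 2)
    ]


# Hypothetical three-microphone layout; the source is equidistant from all
# three microphones, so every pairwise delay is zero.
mics = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
delays = tdoa_vector(mics, source=(0.5, 0.5, 2.0))

# Seven microphones give 7 * 6 / 2 = 21 pairwise delays.
seven_mics = [(float(i), 0.0, 0.0) for i in range(7)]
dimension = len(tdoa_vector(seven_mics, source=(0.0, 1.0, 0.0)))
```

Moving `source` smoothly changes every entry of the returned vector smoothly, which is the sense in which the vector traces out a three dimensional manifold inside the higher dimensional delay space.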

Modeling the TDOA manifold for a particular array
configuration is an integral part of the particle filtering
algorithm we present in Section 4. Our model is based on the
random projection tree spatial partitioning algorithm, whose
details can be found in [5]. A random projection tree (RP-tree)
is a binary tree that recursively splits a dataset into two
subsets. It is constructed in nearly the same way as a KD-tree,
but instead of recursively splitting the dataset along a
single coordinate axis, the data is first projected onto
a random direction and then split near the median of
these projections. RP-trees have been effectively used
as a means for vector quantization and for regression
problems when the data has much lower intrinsic
dimensionality than its ambient dimension [10, 8].

The intrinsic dimensionality of a dataset can be measured
in a variety of ways, including Assouad dimension
or the fraction of variance explained by a PCA at the
appropriate neighborhood size. RP-trees guarantee that
if the data falling in a given node n of the tree has 
intrinsic dimensionality d, then all cells O(d) levels 
below n have at most half the data diameter. This 
guarantee depends only on the intrinsic dimensional- 
ity of the data d and not the ambient dimensionality 
D. Therefore, we can expect a rapid convergence to 
the manifold structure from such a partitioning tree. 



To model the TDOA manifold we first collect a training
set of TDOA vectors sampled from the room containing
our fixed microphone array. This can be done by moving a
white noise source throughout the room. Since white noise
is random, the TDOAs measured via PHAT using Equation (1)
are very reliable training data after some simple outlier
removal. Another way to collect such a training set is from
interactions by people with an interactive display as in [3].

The tree we build in this work is similar to an RP-tree
but uses principal components analysis instead of random
projections. We call this tree a PD-tree; it has been shown
empirically that these trees adapt well to intrinsic
dimensionality in practice [12]. A PD-tree recursively
partitions the training set by projecting the data onto its
top principal direction and then choosing the median of these
projections as the splitting point. A depiction of a PD-tree
of height 1 on a toy dataset is given in Figure 1. We find
that in practice using the top principal direction leads to
quicker convergence to the underlying manifold compared to
using random directions.

At each node of the PD-tree we store the mean and top 
k principal directions of the data that belongs to the 
node. We use this tree as a means of denoising TDOAs. 
For a given TDOA vector, find the corresponding leaf 
node it belongs to and then project it onto the affine 
space spanned by the top k eigenvectors stored in that 
leaf node. This is effectively a projection onto the 
manifold where the manifold is modeled piecewise by 
PCAs of local neighborhoods. 
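
The construction and the denoising step described above can be sketched in Python as follows. This is a simplified illustration and not the authors' implementation: it keeps only the single top principal direction per node (the paper stores the top k), finds that direction by power iteration, and stores nodes as plain dictionaries. The helper names and the toy data are assumptions made for this example.

```python
import random
import statistics


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def top_direction(centered, rng, iters=50):
    """Top principal direction via power iteration on the unnormalized
    covariance, applied implicitly as sum_i (c_i . v) c_i."""
    d = len(centered[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [0.0] * d
        for c in centered:
            s = dot(c, v)
            for j in range(d):
                w[j] += s * c[j]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:  # degenerate data: keep the current direction
            break
        v = [x / norm for x in w]
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def build_pd_tree(points, depth, rng):
    """Recursively split at the median projection onto the top direction."""
    n = len(points)
    mean = [sum(col) / n for col in zip(*points)]
    centered = [sub(p, mean) for p in points]
    direction = top_direction(centered, rng)
    node = {"mean": mean, "dir": direction, "split": None,
            "left": None, "right": None}
    if depth > 0:
        proj = [dot(c, direction) for c in centered]
        split = statistics.median(proj)
        left = [p for p, s in zip(points, proj) if s <= split]
        right = [p for p, s in zip(points, proj) if s > split]
        if left and right:
            node["split"] = split
            node["left"] = build_pd_tree(left, depth - 1, rng)
            node["right"] = build_pd_tree(right, depth - 1, rng)
    return node


def denoise(root, x):
    """Route x to its leaf, then project onto the leaf's affine line
    mean + span(dir) (the k = 1 case of the affine PCA projection)."""
    node = root
    while node["left"] is not None:
        s = dot(sub(x, node["mean"]), node["dir"])
        node = node["left"] if s <= node["split"] else node["right"]
    s = dot(sub(x, node["mean"]), node["dir"])
    return [m + s * u for m, u in zip(node["mean"], node["dir"])]


# Toy data lying exactly on the line y = 2x.
rng = random.Random(0)
train = [(float(t), 2.0 * t) for t in range(-10, 11)]
tree = build_pd_tree(train, depth=1, rng=rng)
projected = denoise(tree, (3.0, 1.0))
```

Because the toy training data lie on a line, denoising the off-line point (3, 1) returns its orthogonal projection onto that line. With real TDOA vectors each leaf would hold a different local affine approximation of the manifold.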

4 Particle Filters & Normal Hedge 

In this section we briefly describe a standard particle 
filtering algorithm as it relates to the TDOA tracking 
problem. We then introduce a new particle filtering al- 
gorithm with a new weighting and particle resampling 
scheme based on results from online learning. 

4.1 Particle Filtering Framework 

Particle filtering is an approximation technique, first
proposed in [9], for solving the Bayesian filtering problem
in state space tracking. For TDOA tracking, the state $X_t$
is composed of the $D$ time delays. A weighting over $m$
particles is chosen to approximate the posterior density at
time $t$ over this state space. A good tutorial discussing
particle filtering and its many variants can be found in [1].

One popular variant is the sampling importance resampling
(SIR) particle filter. We examine this filter for our
purposes since it has been shown to work well for audio
tracking when a coordinate system is known and can be used
for the state representation [13, 11]. A single iteration of
such a SIR particle filtering algorithm for the TDOA tracking
problem is given in Algorithm 1.

Algorithm 1 SIR Particle Filter for TDOA Tracking
Initial Assumptions: At time $t-1$, we have the following:

1. Set of $m$ particles $X_{t-1}^i$ for $i \in \{1, \dots, m\}$.

2. A collection of PHAT correlation observations $R_t$
at time $t$, one for each pair of microphones.

3. Each particle's weight $w_{t-1}^i$, a discrete
representation of the posterior $\Pr(X_{t-1} \mid R_{1:t-1})$.

4. A likelihood function $\mathcal{L}(R_t, X_t) \propto \Pr(R_t \mid X_t)$.

5. A resampling variance parameter $\Sigma_r$.

1: Resampling: Resample $m$ new particles and add
independent Gaussian noise

$$X_t^i = \tilde{X}_t^i + n_i,$$

where $\tilde{X}_t^i$ is drawn according to $\{w_{t-1}^i\}$ from the
set of particles at $t-1$ and $n_i \sim N(0, \Sigma_r)$.
2: Weight Update: Assign each particle a likelihood
weight according to

$$w_t^i = \mathcal{L}(R_t, X_t^i)$$

Normalize weights so that they sum to 1.
3: Prediction: Predict state according to the
weighted average

$$\hat{X}_t = \sum_{i=1}^{m} w_t^i X_t^i$$

At each time step the algorithm goes through a resampling,
a weight update and a prediction stage. The key decisions
for optimizing the performance of this TDOA tracking
algorithm are:

1. The choice of $\mathcal{L}(R_t, X_t)$, the likelihood function
of the observation given the state. For a given
state $X_t$ the likelihood function measures how
likely it is to have observed the PHAT correlation
$R_t$. This function should be chosen so that it is
largest when the coordinates of state $X_t$ are near
many of the peaks in each of the corresponding $R_t$.
However, modeling the true likelihood of the PHAT
observation given the state is challenging, since it is
affected by issues such as line noise and multipath
reflections; a pseudo-likelihood is employed instead.

2. The total number of particles $m$. The larger $m$ is,
the more computational load the system must undertake.
Minimizing $m$ without sacrificing performance is of
paramount importance for real-time implementations.

3. The covariance of the resampling noise, $\Sigma_r$. We
assume a very simple model for the state space in
what follows, namely that sound sources do not
move too quickly. We should choose the size of $\Sigma_r$
to match how quickly we expect sound sources to
be moving. More expressive state spaces that take
into account the velocity or higher order moments
of each TDOA coordinate are not explored in this
work.

We integrate the manifold modeling discussed in the
previous section at the resampling stage. That is, after
a new particle is resampled it can be denoised by
projecting it through the trained tree model. This
prevents particles from drifting off into regions where
TDOAs cannot be created by true sound sources.
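
One iteration of the SIR filter described in this section can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the resampling covariance is taken to be isotropic with standard deviation `sigma`, the optional `denoise` hook stands in for the manifold projection, and `toy_likelihood` is a made-up scoring function used only to exercise the code.

```python
import math
import random


def sir_step(particles, weights, likelihood, sigma, rng, denoise=None):
    """One SIR iteration: resample, perturb, (optionally) denoise,
    re-weight by the likelihood, and predict the weighted mean."""
    m = len(particles)
    # Resampling: draw m parents according to the previous weights.
    parents = rng.choices(particles, weights=weights, k=m)
    moved = []
    for parent in parents:
        x = [v + rng.gauss(0.0, sigma) for v in parent]
        # Optional manifold projection applied right after resampling.
        moved.append(denoise(x) if denoise is not None else x)
    # Weight update: likelihood of the current observation, normalized.
    raw = [likelihood(x) for x in moved]
    total = sum(raw)
    new_weights = [r / total for r in raw]
    # Prediction: weighted average of the particles.
    dim = len(moved[0])
    prediction = [
        sum(w * x[j] for w, x in zip(new_weights, moved)) for j in range(dim)
    ]
    return moved, new_weights, prediction


def toy_likelihood(x, target=(1.0, 1.0)):
    """Hypothetical pseudo-likelihood peaked at a fixed target state."""
    sq = sum((a - b) ** 2 for a, b in zip(x, target))
    return 1e-6 + math.exp(-0.5 * sq)


rng = random.Random(0)
particles = [[0.0, 0.0] for _ in range(200)]
weights = [1.0 / 200] * 200
for _ in range(20):
    particles, weights, prediction = sir_step(
        particles, weights, toy_likelihood, 0.3, rng
    )
```

Passing a projection function as `denoise` is where a tree-based manifold model would plug in. Leaving it as `None` gives the plain SIR filter.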

4.2 Normal Hedge Particle Filtering 

To discuss the differences between the SIR particle fil- 
ter and the normal hedge version we must first intro- 
duce some terminology from the online learning body 
of literature. Normal hedge is an online learning algo- 
rithm that attempts to learn how to combine predic- 
tions from experts at each time step so as to compete 
with the predictions of the best set of experts in the 
collection. 

The algorithm maintains a distribution $w_t^i$ over the
experts. At each time step each expert suffers a
bounded loss $\ell_t^i$, which is a function of the
observation and the expert's prediction at time $t$, typically
squared, absolute or log-loss. The algorithm in turn
suffers the loss $\sum_i w_t^i \ell_t^i$. The cumulative loss at
time $t$ for expert $i$ is then $L_t^i = \sum_{s=1}^{t} \ell_s^i$
(the cumulative loss for the algorithm, $L_t^A$, is defined
similarly). Often the goal of an online learning algorithm
is to maintain a distribution such that $L_t^A$ is small
relative to that of the best expert in the set, $\min_i L_t^i$.
Instead of competing with the best expert in hindsight,
normal hedge attempts to compete with the top
$\epsilon$-quantile of $L_t^i$. This setting is useful when the
number of experts is very large and it is expected that
many of the experts will perform very similarly.

A key concept in online learning is the regret at time $t$
of the algorithm to a particular expert $i$,
$R_t^i = L_t^A - L_t^i$. The theoretical guarantee of normal
hedge is that the algorithm's regret at time $t$ to the
$\lfloor \epsilon N \rfloor$-best expert is small, where $N$ here
denotes the number of experts. This is not as strong as the
regret to the best expert in hindsight being small, but is
very applicable when an $\epsilon$ fraction of experts in fact
predict well. We will exploit this fact in our tracking
problem. In addition, unlike many other online learning
algorithms, which have a learning rate parameter that
controls how aggressively the $w_t^i$ updates are made, normal
hedge has no such parameter to tune. A detailed
explanation of normal hedge in the online setting can
be found in [2].

Algorithm 2 NH Particle Filter for TDOA Tracking
Initial Assumptions: At time $t-1$, we have the following:

1. Set of $m$ particles $X_{t-1}^i$ for $i \in \{1, \dots, m\}$.

2. A collection of PHAT correlation observations $R_t$
at time $t$, one for each pair of microphones.

3. Each particle's weight $w_{t-1}^i$.

4. A scoring function $\mathcal{L}$ for how well $X_t$ matches
the observation $R_t$.

5. A resampling variance parameter $\Sigma_r$.

1: Weight Update: Update the discounted cumulative
regret of each particle and each particle's
weight using (3)-(6). Normalize weights so that
they sum to 1.

2: Prediction: Predict the state according to the
weighted average

$$\hat{X}_t = \sum_{i=1}^{m} w_t^i X_t^i$$

3: Resampling: For each particle with zero weight,
resample a new particle

$$X_t^i = \tilde{X}_t^i + n_i$$

where $\tilde{X}_t^i$ is drawn according to $\{w_{t-1}^i\}$ from the
set of particles at $t-1$ and $n_i \sim N(0, \Sigma_r)$. Also,
reassign the cumulative regret to be the same as
that of $\tilde{X}_t^i$.

Normal hedge is easily adapted to the problem of
tracking with particle filters. Here the experts predict
a state at each time step, exactly as a particle does in
SIR particle filtering. At each time step the experts
suffer a loss which is based on the same likelihood
function $\mathcal{L}(R_t, X_t)$ as discussed for particle filters.
Instead of calculating the cumulative loss of each expert,
we maintain the discounted cumulative regret

$$G_t^i = (1 - \alpha)\, G_{t-1}^i + \left( \mathcal{L}(R_t, X_t^i) - g_t^A \right) \qquad (3)$$

$$g_t^A = \sum_{i=1}^{m} w_{t-1}^i\, \mathcal{L}(R_t, X_t^i) \qquad (4)$$

where $\mathcal{L}$ is the likelihood scoring function used in the
generic particle filtering algorithm, $g_t^A$ is the weighted
likelihood of all the particles, and $\alpha$ is the discounting
factor. The second term in (3) is the instantaneous
regret between the algorithm and the $i$th expert. The
choice of $\alpha$ determines how long the memory is for the
discounted cumulative regret, that is, how long a particle
continues to suffer for mistakes in the past.
Given $G_t^i$ for each particle, we use the normal hedge
weighting update [2] to determine each particle's weight:

$$w_t^i \propto \frac{[G_t^i]_+}{c_t} \exp\!\left( \frac{([G_t^i]_+)^2}{2 c_t} \right) \qquad (5)$$

where $[x]_+ = \max(x, 0)$ and $c_t$ is the solution to

$$\frac{1}{m} \sum_{i=1}^{m} \exp\!\left( \frac{([G_t^i]_+)^2}{2 c_t} \right) = e \qquad (6)$$

where $e$ is Euler's number. Note that the weighting is
very aggressive since it is doubly exponential in $G_t^i$. A
more in depth discussion of the normal hedge particle
filter can be found in [3]. An instantiation of such an
algorithm for the TDOA tracking problem is given in
Algorithm 2.
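
The regret bookkeeping and weighting step can be sketched in Python as follows. The weight rule coded here is the normal hedge update as described in [2], applied to the discounted regrets: each weight is proportional to ([G]+ / c) exp(([G]+)^2 / (2c)), with c chosen so that the average of exp(([G]+)^2 / (2c)) over particles equals e. Solving for c by bisection, the uniform fallback when no particle has positive regret, and all function names are assumptions made for this example.

```python
import math


def update_regrets(regrets, weights, scores, alpha):
    """Discounted cumulative regret: decay the old regret by (1 - alpha) and
    add the particle's score minus the weighted average score."""
    g_alg = sum(w * s for w, s in zip(weights, scores))
    return [(1.0 - alpha) * g + (s - g_alg) for g, s in zip(regrets, scores)]


def _avg_exp(pos, c):
    """Average of exp(g^2 / (2c)) over particles, guarding against overflow."""
    total = 0.0
    for g in pos:
        z = g * g / (2.0 * c)
        if z > 700.0:  # math.exp would overflow; treat the average as infinite
            return math.inf
        total += math.exp(z)
    return total / len(pos)


def normal_hedge_weights(regrets, iters=200):
    """Normal hedge weights from (discounted) regrets."""
    m = len(regrets)
    pos = [max(g, 0.0) for g in regrets]
    if max(pos) < 1e-150:
        # Assumption: no particle has positive regret, so fall back to uniform.
        return [1.0 / m] * m
    # Bracket the scale c: the average is decreasing in c.
    hi = 1.0
    while _avg_exp(pos, hi) > math.e:
        hi *= 2.0
    lo = hi
    while _avg_exp(pos, lo) <= math.e:
        lo /= 2.0
    for _ in range(iters):  # bisection on c
        mid = 0.5 * (lo + hi)
        if _avg_exp(pos, mid) > math.e:
            lo = mid
        else:
            hi = mid
    c = hi
    raw = [(g / c) * math.exp(g * g / (2.0 * c)) for g in pos]
    total = sum(raw)
    return [r / total for r in raw]


# Two particles with uniform weights and scores 1 and 3.
new_regrets = update_regrets([0.0, 0.0], [0.5, 0.5], [1.0, 3.0], alpha=0.05)
nh_weights = normal_hedge_weights([1.0, -1.0, 0.5, 0.0])
```

Particles whose regret is zero or negative receive exactly zero weight, which is what triggers their replacement in the resampling step of the normal hedge filter.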

There are a few things to note about this algorithm.
First, the resampling scheme for particles is built into
the normal hedge framework, since particles are assigned
zero weight when they have a non-positive discounted
cumulative regret. Therefore, whenever a particle is found
to have weight zero, a resampling step replaces it with a
new particle placed near one that is currently performing
better than the algorithm. This leads to a very natural
resampling scheme that undergoes much less sampling per
iteration than the SIR particle filter, which resamples
every particle at every iteration.

The second thing to note is that there are no probabilistic
assumptions about $\mathcal{L}$. The only requirement is
that the user provide a scoring function $\mathcal{L}$ by which
the particles are judged; unlike in SIR particle filters,
it need not be an accurate representation of the true
likelihood. The introduction of a scoring function relative
to which performance can be guaranteed makes for a strong
match with practical considerations.

4.3 Choice of Scoring Function 

What remains to be discussed is how we define our
likelihood (scoring) function $\mathcal{L}$. It is difficult to
accurately define the likelihood of an observation of a
group of PHAT correlations given a particular state.
Instead, we define a pseudo-likelihood $\mathcal{L}$. We would like
$\mathcal{L}$ to be large when the state is near large peaks in the
PHAT correlation series. Moreover, we would like to
encourage the particles to track these peaks over time,
so they should be attracted in the direction of these
peaks as well.

To identify the peaks in a particular PHAT function
we use a simple z-scoring method. Each PHAT
correlation $R_t^p$ undergoes a z-scoring transform as
follows:

$$Z_t^p(\tau) = \begin{cases} \dfrac{R_t^p(\tau) - \mu_t^p}{\sigma_t^p} & \text{if } \dfrac{R_t^p(\tau) - \mu_t^p}{\sigma_t^p} > C \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

where $\mu_t^p$, $\sigma_t^p$ are the mean and standard deviation of
$R_t^p$ over a fixed bounded range of $\tau$, and $C$ is a constant
requiring that peaks be at least $C$ standard deviations
above the mean. This works well for finding a small
number, $K_t^p$, of peaks in each $R_t^p$, since PHAT
sequences typically have a small number of very large
peaks relative to the rest of the series. Now we define
a pseudo-likelihood function as follows:



$$\mathcal{L}(Z_t, X_t) = z_0 + \sum_{p=1}^{D} \sum_{i=1}^{K_t^p} Z_t^p(\tau_i^p)\, \mathcal{N}(\tau_i^p; X_t^p, \sigma_z^2) \qquad (8)$$

where $X_t^p$ is the TDOA for pair $p$ for this state,
$\mathcal{N}(x; \mu, \sigma^2)$ is the density under a normal distribution
evaluated at $x$ with mean $\mu$ and variance $\sigma^2$, and $Z_t^p$
has $K_t^p$ non-zero entries, each of which is at $\tau_i^p$. The
parameter $z_0$ is the background likelihood that determines
how much likelihood is given to any state. The
variance parameter $\sigma_z^2$ controls how much weighting is
given relative to how far each state is from the peaks
in the corresponding PHAT series. A similar
pseudo-likelihood function is given in [13].
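
The peak detection and pseudo-likelihood of this section can be sketched in Python as below. The sketch assumes that the z-score transform keeps an entry only when its z-score exceeds the threshold C and sets it to zero otherwise, and that the population standard deviation is used. The function names and the toy correlation are hypothetical.

```python
import math
import statistics


def zscore_peaks(corr, lags, threshold):
    """Return {lag: z} for correlation entries whose z-score exceeds the
    threshold; all other entries are treated as zero and omitted."""
    mu = statistics.fmean(corr)
    sd = statistics.pstdev(corr)
    if sd == 0.0:
        return {}
    peaks = {}
    for lag, r in zip(lags, corr):
        z = (r - mu) / sd
        if z > threshold:
            peaks[lag] = z
    return peaks


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def pseudo_likelihood(peaks_per_pair, state, z0, var):
    """Background term z0 plus, for every microphone pair, each peak's z-score
    weighted by a Gaussian bump centred on the state's delay for that pair."""
    total = z0
    for peaks, x_p in zip(peaks_per_pair, state):
        for lag, z in peaks.items():
            total += z * normal_pdf(lag, x_p, var)
    return total


# Toy correlation for a single pair: one clear spike at lag 7.
corr = [0.0] * 20
corr[7] = 1.0
peaks = zscore_peaks(corr, lags=list(range(20)), threshold=3.0)
near = pseudo_likelihood([peaks], state=[7.0], z0=1.0, var=10.0)
far = pseudo_likelihood([peaks], state=[15.0], z0=1.0, var=10.0)
```

A state whose delay sits on the detected peak scores higher than one far from it, while the background term keeps every state's score strictly positive.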

4.4 Integrating the Manifold Model 

The manifold modeling from Section 3 is integrated
into both particle filtering algorithms very easily after 
the resampling stage. A final step is added after re- 
sampling a particle to denoise it so that it lies on the 
model of the manifold. The leaf node in the PD-tree 
that corresponds to the state of the particle is found. 
To denoise it, the particle is then projected onto the 
affine space spanned by the top k PCA components 
stored in this leaf. 

In the experiments that follow we explore several
manifold models:

1. No manifold modeling: no projection is performed
after the resampling step.

2. Fixed depth manifold modeling: We grow
each PD-tree to a fixed depth and use the leaf
nodes at this depth as the manifold model.

3. Randomized manifold modeling: We grow
the tree to a fixed depth and examine the path
from root to leaf node that the particle takes in
the PD-tree. We then choose one of the nodes
along this path uniformly at random to be the
node used for the projection. We hope this
randomized model is able to adapt over time to
whichever levels of the tree are currently best at
modeling the position of the sound source
being tracked.

Figure 2: Performance of NH and PF with and without using a global PCA projection for denoising.
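
The difference between the fixed depth and randomized models listed above comes down to which node on a particle's root-to-leaf path supplies the projection. A minimal Python sketch of that selection follows. The dictionary-based node layout, the function names and the hand-built two-level tree are assumptions made for this example.

```python
import random


def path_to_leaf(root, x):
    """Nodes visited when routing x from the root down to its leaf."""
    path = [root]
    node = root
    while node["left"] is not None:
        s = sum((a - m) * u for a, m, u in zip(x, node["mean"], node["dir"]))
        node = node["left"] if s <= node["split"] else node["right"]
        path.append(node)
    return path


def choose_projection_node(root, x, mode, rng=None, depth=0):
    """Pick the node whose local PCA is used to project the particle x."""
    path = path_to_leaf(root, x)
    if mode == "fixed":
        # Use the node at the requested depth (or the leaf if shallower).
        return path[min(depth, len(path) - 1)]
    if mode == "random":
        # Uniform choice among all nodes on the root-to-leaf path.
        return rng.choice(path)
    raise ValueError("mode must be 'fixed' or 'random'")


def make_leaf(mean):
    return {"mean": mean, "dir": [1.0, 0.0], "split": None,
            "left": None, "right": None}


# Hand-built tree of depth 1 that splits on the first coordinate at zero.
root = {"mean": [0.0, 0.0], "dir": [1.0, 0.0], "split": 0.0,
        "left": make_leaf([-1.0, 0.0]), "right": make_leaf([1.0, 0.0])}

rng = random.Random(0)
x = [2.0, 0.5]
path = path_to_leaf(root, x)
fixed_node = choose_projection_node(root, x, "fixed", depth=0)
random_node = choose_projection_node(root, x, "random", rng=rng)
```

The "no manifold modeling" variant simply skips this selection and leaves the resampled particle unprojected.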

5 Experiments 

5.1 Experimental Setup 

Recordings were made at 16 kHz on a 7 microphone
array that is part of an interactive display placed in a
large public lobby. The room is approximately 10m x
13m x 5m in size. Four of the microphones are placed
at the corners of the display, which is mounted on one
of the walls in the room, and the three remaining
microphones are placed on the ceiling of the room. For
more details of the microphone setup and the room
see [4].

To build a PD-tree we first collected a training set of
TDOA vectors from our microphone array. We accomplished
this by moving a white noise producing sound
source around the room near typical locations from which
sitting or standing people would interact with the
display. This resulted in approximately 20000 training
TDOA vectors, from which we built a PD-tree of depth 2.
In each node of the PD-tree we store the mean of the
training data and the top k=3 principal directions.



Here are the parameter settings we use for the exper- 
iments that follow. We use m=50 particles for each 
type of particle filter examined. Our frame size is 500 
ms with an overlap of 25 ms. We set S r = -Id, where 
r is the sampling rate. The discounting factor for NH
is set to $\alpha = 0.05$, and the parameters of Equation (8)
are $\sigma_z^2 = 10$ and $z_0 = 1$.

We made several real audio recordings of a person 
walking throughout the room facing the array and 
talking. We describe each experiment in detail in what 
follows. 

5.2 Usage of Manifold Modeling 

This first experiment has a person walking and count- 
ing aloud while facing the array. The person's path 
goes through the center of the room far from each mi- 
crophone. Since TDOAs evolve more slowly when the 
sound source is far from each microphone we'd expect 
this to be well modeled by the root PCA of our PD- 
tree. Here we compare using the root PCA of our 
PD-tree versus no projection step at all for both SIR 
particle filters (PF) and the normal hedge particle fil- 
ters (NH). 

Figure 2 depicts such a comparison. Here we show
tracking results from two microphone pairs that are
typical of the remaining pairs. Shown in green is $Z_t^p$,
where its magnitude is represented by the size of the
circle marker. The sound source moved along a continuous
and slowly changing path, so we would expect each TDOA
coordinate to follow a continuous and slowly changing
path as well. The trackers with the PCA projection
step are able to follow the sound source, while the
versions without the projection lose the source quickly.

Remember that there are only 50 particles to track a
state that is 21 dimensional. There are no dynamics
involved in our particle filters, so the resampling stage
alone has to include enough randomness for the source
to be tracked as it moves. When the manifold model is
not used, the amount of randomness needed is too large
for 50 particles to be able to track in all $D$ dimensions.
However, when a model of the manifold is used, effective
tracking results can be obtained. Moreover, it should
be noted that the normal hedge version uses less
randomness, since it only resamples when the weight of a
particle becomes zero. Despite this, the normal hedge
versions achieve performance competitive with SIR
particle filters.

Figure 3: Using various depths in the PD-tree as part of the projection step.

5.3 Testing Different Manifold Models 

The setup of this experiment is exactly the same as the
last, except that the path the speaker took traveled much
closer to some pairs of microphones at certain points
in time. When a sound source is moving close to some
set of microphones, the TDOAs involved with those
microphones will change much more rapidly and in a
much more non-linear way. With this path we hope to
examine the usefulness of deeper nodes in the PD-tree.
Since the performance of PF and NH is comparable
when using the global PCA projection, we only examine
NH in this experiment.

Figure 3 is similar to the figure discussed in the previous
section. The particle filtering variants examined
here use projections at fixed depth zero (NH-0), one
(NH-1), and two (NH-2). The random strategy discussed
in Section 4.4 is also examined (NH-rand). It
is clear that somewhere between 50 and 70 s the location
of the sound source is modeled poorly by the global
PCA at the root and is better modeled by the PCA
at level 2. However, it is only for this short duration
that this modeling transition takes place. Depths 0
and 1 performed particularly poorly in this region,
while depth 2 seems to have a significant advantage.

However, the best performing tracker was one that
utilized the entire tree structure in a random fashion. By
allowing particles to die and be reborn randomly, there was
a clear pressure for NH-rand to transition rather quickly
from a depth-0 model to a depth-2 model. This can
be seen in Figure 4. Here we depict, by a stacked bar
graph, what proportion of the 50 particles at time $t$ were
last sampled from each depth. There is a clear preference
for transitioning towards depth 2 during this particular
time period. Nearly all the particles sampled from depth 2
stay alive during this period.

This is a rather intuitive result since a particular 
node's PCA model may only be good for tracking in a 
small region of the entire 21 dimensional space that its 
PD-tree node represents. When the sound source exits 
this region, some other depth in the tree may become 
a better model. Using the randomness over time by 
NH-rand naturally captures such transitions. 

Figure 5 shows a sound source moving at constant
speed along a back-and-forth sweeping path. Each sweep
starts beyond one side of the display and continues
across and past the opposite end of the display. This
is repeated at various distances away from the display.
The TDOA vectors predicted by NH-rand are
projected on the top 2 principal components of the
root PCA. Colors indicate time, dark blue being the
earliest part of the path, which started approximately
1m from the display, and red being the last segment of the
path, approximately 12m away. The change in TDOAs
is greatest when near the microphones on the display,
which results in a wide spacing of points. The markers
indicate which of the 3 depths the majority of the
NH-rand particles were last sampled from. In the center
of the room it is clear that the root PCA performs
best, whereas near the display on the right side depth
2 dominates, and far from the display depth 1 is best.

Figure 4: For NH-rand, the PD-tree depths at time $t$
that the $m$ particles have been sampled from last.

6 Conclusion 

In this work we examine particle filtering methods for 
tracking the TDOA vectors for moving sound sources. 
This is an essential problem to solve for audio localiza- 
tion and sound enhancement applications. We present 
a model of the manifold based on space partitioning 
trees that alleviates the problem of high dimensional 
tracking with particle filters. We also present a new 
version of a particle filter based on results from online 
learning that is competitive with traditional particle 
filters on this task and has properties that are attrac- 
tive to many real world problems. 

References 

[1] M. Arulampalam, S. Maskell, N. Gordon, and
T. Clapp. A tutorial on particle filters for online
nonlinear/non-Gaussian Bayesian tracking. IEEE
Transactions on Signal Processing, 50(2):174-188,
2002.

[2] K. Chaudhuri, Y. Freund, and D. Hsu. A parameter- 
free hedging algorithm. In Advances in Neural Infor- 
mation Processing Systems 22, pages 297-305. 2009. 

[3] K. Chaudhuri, Y. Freund, and D. Hsu. Tracking us- 
ing explanation-based modeling. Technical report, UC 
San Diego, 2009. 

[4] S. Cheamanunkul, E. Ettinger, M. Jacobsen, P. Lai,
and Y. Freund. Detecting, tracking and interacting
with people in a public space. In ICMI-MLMI '09:
Proceedings of the 2009 international conference on
Multimodal interfaces, pages 79-86, 2009.

Figure 5: Sweeping path for NH-rand on top 2 principal
directions of root PCA (horizontal axis: 1st principal
direction).

[5] S. Dasgupta and Y. Freund. Random projection trees 
for vector quantization. Information Theory, IEEE 
Transactions on, 55(7):3229-3242, July 2009. 

[6] J. DiBiase, H. Silverman, and M. Brandstein. Ro- 
bust localization in reverberant rooms. In Microphone 
arrays: signal processing techniques and applications, 
page 157, 2001. 

[7] E. Ettinger and Y. Freund. Coordinate-free calibration
of an acoustically driven camera pointing system.
In Distributed Smart Cameras, 2008. ICDSC
2008. Second ACM/IEEE International Conference
on, pages 1-9, Sept. 2008.

[8] Y. Freund, S. Dasgupta, M. Kabra, and N. Verma. 
Learning the structure of manifolds using random pro- 
jections. In Advances in Neural Information Process- 
ing Systems 20. 2007. 

[9] N. Gordon, D. Salmond, and A. Smith. Novel ap- 
proach to nonlinear/non-Gaussian Bayesian state es- 
timation. IEE proceedings. Part F. Radar and signal 
processing, 140(2):107-113, 1993. 

[10] S. Kpotufe. Escaping the curse of dimensionality with 
a tree-based regressor. In COLT '09: Proceedings of 
the 22nd annual workshop on computational learning 
theory, 2009. 

[11] E. Lehmann and A. Johansson. Particle filter
with integrated voice activity detection for acoustic
source tracking. EURASIP J. Appl. Signal Process.,
2007(1):28-28, 2007.

[12] N. Verma, S. Kpotufe, and S. Dasgupta. Which spatial 
partition trees are adaptive to intrinsic dimension? In 
The 25th Conference on Uncertainty in Artificial In- 
telligence. 2009. 

[13] D. Ward, E. Lehmann, and R. Williamson. Parti- 
cle filtering algorithms for tracking an acoustic source 
in a reverberant environment. IEEE Transactions on 
Speech and Audio Processing, 11:826-836, 2003. 



